R, Programming assignment

Simulating pathways, mutual exclusivity, et al.

Authors:

Raquel Blanco Martínez-Illescas, Daniel Peñas Utrilla, Henry Secaira Morocho

Bioinformatics and Computational Biology, UAM

27/01/2021

Introduction

Cancer Progression Models

Cancer is an heterogeneous disease caused by the continuous accumulation of somatic mutations during lifetime of an individual. Somatic mutations that affect the cells are divided in:

  • Passenger mutations --> Silent mutations
  • Driver mutations --> Lead to cancer progression

Cancer Progression Models (cont.)

  • Tumor samples are obtained as cross-sectional samples --> single-time snapshots from tumors
  • Not all possible order in mutations are equally responsible for cancer progression --> infer order of accumulation of mutations leading to cancer progression. Order in accumulation of mutations is covered by Cancer Progression Models (CPM). They are depicted as Directed Acyclic Graphs (DAG), where nodes represent genes and arrows dependencies between them

Cristea et al., 2017

Evolutionary Models

  • Order of restrictions are encoded in DAGs

  • Mutations occur only if the preceding parent mutations have occurred (monotonicity)

  • Genotypes can follow different paths during disease progression

  • Evolutionary tumor progression models allows deviations from monotonicity

Diaz-Uriarte, R. (2018)

Evolutionary Models (cont.)

  • Fitness landscapes can be used to:
    • Visualize the possible paths of tumor progression
    • Identify genes that block those paths



Diaz-Uriarte, R. (2018)

Order of effects

  • The order in which somatic mutations are acquired influence clonal evolution

    • Mutations can behave as driver or passenger depending on the genetic context
  • Therefore, the fitness a mutations depends in which mutations were acquired previously



Epistatic interactions

  • Cancer progression is driven by the accumulation of somatic mutations that interact epistatically

  • This is, their effect is non-additive to the tumor fitness as a phenotype

Synthetic Viability

  • The combination of two mutations that rescue the lethal effects of a each single mutation

Mutual Exclusivity

  • Common in cancer progresion: pathways
  • Null effect: a mutation that occurs first produces the most selective advantage

Mutual Exclusivity (cont.)

  • Synthetic lethality: combination of two mutations is detrimental for the viability of the cell, whereas each mutation is not

Frequency-dependent fitness

  • Coexistance of multiple clones in time

  • Evolutionary game theory

  • Fitness is actually a function of the relative frequency of other clones

PathTiMEx, a generative probabilistic graphical model of cancer progression.

PathTiMex is a probabilistic model of tumor progression among mutually exclusive driver pathways. This model was used to infer cancer progression in colorectal cancer.

Cristea et al., 2017

This generative model is mapped into an evolutionary model, where deviations from monotonicity are allowed.

Model

In [4]:
## Restriction table (extended version of the poset)
colcancer <- data.frame(
                 parent = c(rep("Root",3), "A", "B", "C"), ## Parent nodes
                 child = c("A", "B", "D", "C", "E", "E"), ## Child nodes
                 s = c(0.5, 0.2, 0.05, 0.1, rep(0.05, 2)), ## Restrictions are satisfied
                 sh = -0.3, ## Deviations from monotonicity (penalization)
                 typeDep = "MN" ## Type of dependency 
                  )
## Fitness specification of the poset
colcancer_efec <- allFitnessEffects(
                  colcancer, # Poset
                  geneToModule = c( ## Specification of the modules
                               "Root" = "Root", 
                               "A" = "APC",
                               "B" = "TP53, EVC2",
                               "C" = "KRAS",
                               "D" = "PI3KCA, EPHA",
                               "E" = "FBXW7, TCF7L2"),
                  drvNames = c( ## Specification of drivers
                               "APC", "TP53", "EVC2", "KRAS",
                               "PI3KCA", "EPHA", "FBXW7", "TCF7L2"))
In [3]:
## DAG representation
plot(colcancer_efec, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [5]:
# Evaluation of all genotypes and its fitness
colcancer_efec_FL <- evalAllGenotypes(colcancer_efec, max = 110000)
## Plot of fitness landscape
plotFitnessLandscape(colcancer_efec_FL)

Order Effects in cancer progression

Simplified model with 4 genes: APC, TP53, FBXW7 and KRAS.

In [4]:
# DAG from the simplified model
plot(cc_visuali, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Order Effects in cancer progression (cont.)

Mutation's order is inferred with pathTiMEx through the waiting time rate parameter. It informs about the mutation's arise. Fitness value associated to each order is derived from this parameter.

Gene/module Waiting time rate parameter
APC 9.5
KRAS 2.89
TP53, EVC2 1.92
PIK3CA, EPHA3 0.17
FBXW7, TCF7L2 0.08

The order in mutations: APC > TP53 > KRAS > FBXW7 is consistent with restrictions and is associated to the highest fitness value. Different order deviates from monotoniciy and lower fitness is associated.

Order Effects in cancer progression (cont.)

In [5]:
## Order effects
cc_order <- allFitnessEffects(
  orderEffects = c("A > B > C > D" = 0.5, 
                   "B > A > C > D" = 0.2,
                   "B > C > A > D" = 0.1,
                   "B > C > D > A" = 0.05,
                   "A > C" = 0.2,
                   "C > A" = 0.05,
                   "D > A" = 0.05,
                   "A > D" = 0.2,
                   "B > D" = 0.2,
                   "C > D" = 0.2,
                   "B > C" = 0.2,
                   "C > B" = 0.1,
                   "B > A" = 0.1,
                   "A > B" = 0.3),

  geneToModule =
    c("A" = "APC",
      "B" = "KRAS",
      "C" = "TP53",
      "D" = "FBXW7") )

Map Genotypes to Fitness

In [6]:
## Fitness associated to each genotype
(cc_order_geno <- evalAllGenotypes(cc_order, order = TRUE))
GenotypeFitness
APC 1,0000
FBXW7 1,0000
KRAS 1,0000
TP53 1,0000
APC > FBXW7 1,2000
APC > KRAS 1,3000
APC > TP53 1,2000
FBXW7 > APC 1,0500
FBXW7 > KRAS 1,0000
FBXW7 > TP53 1,0000
KRAS > APC 1,1000
KRAS > FBXW7 1,2000
KRAS > TP53 1,2000
TP53 > APC 1,0500
TP53 > FBXW7 1,2000
TP53 > KRAS 1,1000
APC > FBXW7 > KRAS 1,5600
APC > FBXW7 > TP53 1,4400
APC > KRAS > FBXW7 1,8720
APC > KRAS > TP53 1,8720
APC > TP53 > FBXW7 1,7280
APC > TP53 > KRAS 1,7160
FBXW7 > APC > KRAS 1,3650
FBXW7 > APC > TP53 1,2600
FBXW7 > KRAS > APC 1,1550
FBXW7 > KRAS > TP531,2000
FBXW7 > TP53 > APC 1,1025
FBXW7 > TP53 > KRAS1,1000
KRAS > APC > FBXW7 1,5840
KRAS > APC > TP53 1,5840
......
TP53 > APC > FBXW7 1,512000
TP53 > APC > KRAS 1,501500
TP53 > FBXW7 > APC 1,323000
TP53 > FBXW7 > KRAS 1,320000
TP53 > KRAS > APC 1,270500
TP53 > KRAS > FBXW7 1,584000
APC > FBXW7 > KRAS > TP532,246400
APC > FBXW7 > TP53 > KRAS2,059200
APC > KRAS > FBXW7 > TP532,695680
APC > KRAS > TP53 > FBXW74,852224
APC > TP53 > FBXW7 > KRAS2,471040
APC > TP53 > KRAS > FBXW72,965248
FBXW7 > APC > KRAS > TP531,965600
FBXW7 > APC > TP53 > KRAS1,801800
FBXW7 > KRAS > APC > TP531,663200
FBXW7 > KRAS > TP53 > APC1,455300
FBXW7 > TP53 > APC > KRAS1,576575
FBXW7 > TP53 > KRAS > APC1,334025
KRAS > APC > FBXW7 > TP532,280960
KRAS > APC > TP53 > FBXW73,284582
KRAS > FBXW7 > APC > TP531,995840
KRAS > FBXW7 > TP53 > APC1,746360
KRAS > TP53 > APC > FBXW72,634509
KRAS > TP53 > FBXW7 > APC2,200414
TP53 > APC > FBXW7 > KRAS2,162160
TP53 > APC > KRAS > FBXW72,594592
TP53 > FBXW7 > APC > KRAS1,891890
TP53 > FBXW7 > KRAS > APC1,600830
TP53 > KRAS > APC > FBXW72,195424
TP53 > KRAS > FBXW7 > APC1,920996

Map Genotypes to Fitness (cont.)

In [7]:
#DAG
plot(cc_order)
Error in `*tmp*`[[i]]: subíndice fuera de  los límites
Traceback:

1. plot(cc_order)
2. plot.fitnessEffects(cc_order)
3. plot(g1, edgeAttrs = list(arrowsize = a1, lty = s1, lwd = lwd, 
 .     color = c1), nodeAttrs = nAttrs)
4. plot(g1, edgeAttrs = list(arrowsize = a1, lty = s1, lwd = lwd, 
 .     color = c1), nodeAttrs = nAttrs)
5. .local(x, y, ...)
6. agopen(x, name = name, layout = TRUE, layoutType = y, attrs = attrs, 
 .     nodeAttrs = nodeAttrs, edgeAttrs = edgeAttrs, subGList = subGList, 
 .     recipEdges = recipEdges)
In [8]:
# Fitness landscape
plotFitnessLandscape(cc_order_geno)
Error in to_Fitness_Matrix(x, max_num_genotypes = max_num_genotypes): We cannot deal with order effects yet.
Traceback:

1. plotFitnessLandscape(cc_order_geno)
2. to_Fitness_Matrix(x, max_num_genotypes = max_num_genotypes)
3. stop("We cannot deal with order effects yet.")

Pathway Linear Progression Model

  • Generative probabilistic model

  • The order in which mutations arise are better described at the pathway level instead of a gene level

  • Model mapped the progression model from colorectal cancer data

Model

In [12]:
## Define poset restrictions, mapping of genes to modules, and driver genes
(CRC_W <- allFitnessEffects(data.frame(parent = c("Root", "A", "B", "C"),
                                      child = c("A", "B", "C", "D"),
                                      s = c(0.6, 0.4, 0.1, 0.05),
                                      sh = -0.5,
                                      typeDep = "MN"), 
                           geneToModule = c("Root" = "Root",
                                            "A" = "APC, EPHA3, TCF7L2",
                                            "B" = "EVC2, PIK3CA, TP53", 
                                            "C" = "KRAS",
                                            "D" = "FBXW7"),
                           drvNames = c("APC", "EPHA3", "TCF7L2", "EVC2", "PIK3CA", 
                                        "TP53", "KRAS", "FBXW7")))
$long.rt
$long.rt$A
$long.rt$A$child
[1] "A"

$long.rt$A$s
[1] 0.6

$long.rt$A$sh
[1] -0.5

$long.rt$A$typeDep
        MN 
"monotone" 

$long.rt$A$parents
[1] "Root"

$long.rt$A$childNumID
A 
1 

$long.rt$A$parentsNumID
Root 
   0 


$long.rt$B
$long.rt$B$child
[1] "B"

$long.rt$B$s
[1] 0.4

$long.rt$B$sh
[1] -0.5

$long.rt$B$typeDep
        MN 
"monotone" 

$long.rt$B$parents
[1] "A"

$long.rt$B$childNumID
B 
2 

$long.rt$B$parentsNumID
A 
1 


$long.rt$C
$long.rt$C$child
[1] "C"

$long.rt$C$s
[1] 0.1

$long.rt$C$sh
[1] -0.5

$long.rt$C$typeDep
        MN 
"monotone" 

$long.rt$C$parents
[1] "B"

$long.rt$C$childNumID
C 
3 

$long.rt$C$parentsNumID
B 
2 


$long.rt$D
$long.rt$D$child
[1] "D"

$long.rt$D$s
[1] 0.05

$long.rt$D$sh
[1] -0.5

$long.rt$D$typeDep
        MN 
"monotone" 

$long.rt$D$parents
[1] "C"

$long.rt$D$childNumID
D 
4 

$long.rt$D$parentsNumID
C 
3 



$long.epistasis
list()

$long.orderEffects
list()

$long.geneNoInt
data frame with 0 columns and 0 rows

$geneModule
    Gene Module GeneNumID ModuleNumID
1   Root   Root         0           0
2    APC      A         1           1
3  EPHA3      A         2           1
4   EVC2      B         3           2
5  FBXW7      D         4           4
6   KRAS      C         5           3
7 PIK3CA      B         6           2
8 TCF7L2      A         7           1
9   TP53      B         8           2

$gMOneToOne
[1] FALSE

$geneToModule
                Root                    A                    B 
              "Root" "APC, EPHA3, TCF7L2" "EVC2, PIK3CA, TP53" 
                   C                    D 
              "KRAS"              "FBXW7" 

$graph
IGRAPH c356bc7 DN-- 5 4 -- 
+ attr: name (v/c), typeDep (e/c), color (e/c), lty (e/n), arrow.mode
| (e/n)
+ edges from c356bc7 (vertex names):
[1] Root->A A   ->B B   ->C C   ->D

$drv
[1] 1 2 3 4 5 6 7 8

$rT
  parent child    s   sh typeDep
1   Root     A 0.60 -0.5      MN
2      A     B 0.40 -0.5      MN
3      B     C 0.10 -0.5      MN
4      C     D 0.05 -0.5      MN

$epistasis
NULL

$orderEffects
NULL

$noIntGenes
NULL

$fitnessLandscape
     [,1]

$fitnessLandscape_df
data frame with 0 columns and 0 rows

$fitnessLandscape_gene_id
data frame with 0 columns and 0 rows

$fitnessLandscapeVariables
character(0)

$frequencyDependentFitness
[1] FALSE

$frequencyType
[1] "freq_dep_not_used"

attr(,"class")
[1] "fitnessEffects"

DAG

In [6]:
# DAG representation
plot(CRC_W, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [13]:
## Map genotypes to fitness
CRC_F <- evalAllGenotypes(CRC_W, order = FALSE, addwt = TRUE)

## Plot of fitness landscape

plot(CRC_F)

Simplified model

In [14]:
## Simplified model
## Define poset restrictions, mapping of genes to modules, and driver genes
(CRC_W2 <- allFitnessEffects(data.frame(parent = c(rep("Root", 2), "A", "B", "C"),
                                      child = c("A", "B", rep("C", 2), "D"),
                                      s = c(0.2, 0.1, rep(0.05, 2), 0.01),
                                      sh = -0.5,
                                      typeDep = c(rep("XMPN", 4), "MN")), 
                           geneToModule = c("Root" = "Root",
                                            "A" = "APC",
                                            "B" = "TP53", 
                                            "C" = "KRAS",
                                            "D" = "FBXW7"),
                           drvNames = c("APC", "TP53", "KRAS", "FBXW7")))
$long.rt
$long.rt$A
$long.rt$A$child
[1] "A"

$long.rt$A$s
[1] 0.2

$long.rt$A$sh
[1] -0.5

$long.rt$A$typeDep
  XMPN 
"xmpn" 

$long.rt$A$parents
[1] "Root"

$long.rt$A$childNumID
A 
1 

$long.rt$A$parentsNumID
Root 
   0 


$long.rt$B
$long.rt$B$child
[1] "B"

$long.rt$B$s
[1] 0.1

$long.rt$B$sh
[1] -0.5

$long.rt$B$typeDep
  XMPN 
"xmpn" 

$long.rt$B$parents
[1] "Root"

$long.rt$B$childNumID
B 
2 

$long.rt$B$parentsNumID
Root 
   0 


$long.rt$C
$long.rt$C$child
[1] "C"

$long.rt$C$s
[1] 0.05

$long.rt$C$sh
[1] -0.5

$long.rt$C$typeDep
  XMPN 
"xmpn" 

$long.rt$C$parents
[1] "A" "B"

$long.rt$C$childNumID
C 
3 

$long.rt$C$parentsNumID
A B 
1 2 


$long.rt$D
$long.rt$D$child
[1] "D"

$long.rt$D$s
[1] 0.01

$long.rt$D$sh
[1] -0.5

$long.rt$D$typeDep
        MN 
"monotone" 

$long.rt$D$parents
[1] "C"

$long.rt$D$childNumID
D 
4 

$long.rt$D$parentsNumID
C 
3 



$long.epistasis
list()

$long.orderEffects
list()

$long.geneNoInt
data frame with 0 columns and 0 rows

$geneModule
   Gene Module GeneNumID ModuleNumID
1  Root   Root         0           0
2   APC      A         1           1
3 FBXW7      D         2           4
4  KRAS      C         3           3
5  TP53      B         4           2

$gMOneToOne
[1] FALSE

$geneToModule
   Root       A       B       C       D 
 "Root"   "APC"  "TP53"  "KRAS" "FBXW7" 

$graph
IGRAPH 440feb2 DN-- 5 5 -- 
+ attr: name (v/c), typeDep (e/c), color (e/c), lty (e/n), arrow.mode
| (e/n)
+ edges from 440feb2 (vertex names):
[1] Root->A Root->B A   ->C B   ->C C   ->D

$drv
[1] 1 2 3 4

$rT
  parent child    s   sh typeDep
1   Root     A 0.20 -0.5    XMPN
2   Root     B 0.10 -0.5    XMPN
3      A     C 0.05 -0.5    XMPN
4      B     C 0.05 -0.5    XMPN
5      C     D 0.01 -0.5      MN

$epistasis
NULL

$orderEffects
NULL

$noIntGenes
NULL

$fitnessLandscape
     [,1]

$fitnessLandscape_df
data frame with 0 columns and 0 rows

$fitnessLandscape_gene_id
data frame with 0 columns and 0 rows

$fitnessLandscapeVariables
character(0)

$frequencyDependentFitness
[1] FALSE

$frequencyType
[1] "freq_dep_not_used"

attr(,"class")
[1] "fitnessEffects"
In [3]:
# DAG representation
plot(CRC_W2, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [15]:
## Simplified Model
## Map genotypes to fitness
(CRC_F2 <- evalAllGenotypes(CRC_W2, order = FALSE, addwt = TRUE))
A evalAllGenotypes: 16 × 2
GenotypeFitness
<chr><dbl>
WT 1.00000
APC 1.20000
FBXW7 0.50000
KRAS 0.50000
TP53 1.10000
APC, FBXW7 0.60000
APC, KRAS 1.26000
APC, TP53 1.32000
FBXW7, KRAS 0.50500
FBXW7, TP53 0.55000
KRAS, TP53 1.15500
APC, FBXW7, KRAS 1.27260
APC, FBXW7, TP53 0.66000
APC, KRAS, TP53 0.66000
FBXW7, KRAS, TP53 1.16655
APC, FBXW7, KRAS, TP530.66660

Map Genotypes to Fitness (cont.)

In [16]:
## Plot of fitness landscape

plot(CRC_F2, use_ggrepel = TRUE)

Synthetic Lethality via Three-way Interaction

  • Restrictions imposed with XOR

  • Fitness values that reflect slightly deleterious effect (if two genes appear)

  • or a highly deleterious effect (if three genes appear).

In [17]:
## Simplified model
## Define poset restrictions, mapping of genes to modules, and driver genes
(CRC_W4 <- allFitnessEffects(data.frame(parent = c(rep("Root", 2), "A", "B", "C"),
                                      child = c("A", "B", rep("C", 2), "D"),
                                      s = c(0.2, 0.1, rep(0.05, 2), 0.01),
                                      sh = -0.5,
                                      typeDep = c(rep("XMPN", 4), "MN")), 
                            epistasis = c("A : -B : -C" = 0.2,
                                          "-A : B : -C" = 0.1,
                                          "-A : -B : C" = 0.05,
                                          "A : B : -C" = 0.01,
                                          "-A : B : C" = 0.02,
                                          "-B : A : C" = 0.02,
                                          "A : B : C" = -0.5),
                           geneToModule = c("Root" = "Root",
                                            "A" = "APC",
                                            "B" = "TP53", 
                                            "C" = "KRAS",
                                            "D" = "FBXW7"),
                           drvNames = c("APC", "TP53", "KRAS", "FBXW7")))
$long.rt
$long.rt$A
$long.rt$A$child
[1] "A"

$long.rt$A$s
[1] 0.2

$long.rt$A$sh
[1] -0.5

$long.rt$A$typeDep
  XMPN 
"xmpn" 

$long.rt$A$parents
[1] "Root"

$long.rt$A$childNumID
A 
1 

$long.rt$A$parentsNumID
Root 
   0 


$long.rt$B
$long.rt$B$child
[1] "B"

$long.rt$B$s
[1] 0.1

$long.rt$B$sh
[1] -0.5

$long.rt$B$typeDep
  XMPN 
"xmpn" 

$long.rt$B$parents
[1] "Root"

$long.rt$B$childNumID
B 
2 

$long.rt$B$parentsNumID
Root 
   0 


$long.rt$C
$long.rt$C$child
[1] "C"

$long.rt$C$s
[1] 0.05

$long.rt$C$sh
[1] -0.5

$long.rt$C$typeDep
  XMPN 
"xmpn" 

$long.rt$C$parents
[1] "A" "B"

$long.rt$C$childNumID
C 
3 

$long.rt$C$parentsNumID
A B 
1 2 


$long.rt$D
$long.rt$D$child
[1] "D"

$long.rt$D$s
[1] 0.01

$long.rt$D$sh
[1] -0.5

$long.rt$D$typeDep
        MN 
"monotone" 

$long.rt$D$parents
[1] "C"

$long.rt$D$childNumID
D 
4 

$long.rt$D$parentsNumID
C 
3 



$long.epistasis
$long.epistasis[[1]]
$long.epistasis[[1]]$ids
[1] "-C" "-B" "A" 

$long.epistasis[[1]]$s
[1] 0.2

$long.epistasis[[1]]$NumID
 C  B  A 
-3 -2  1 


$long.epistasis[[2]]
$long.epistasis[[2]]$ids
[1] "-C" "-A" "B" 

$long.epistasis[[2]]$s
[1] 0.1

$long.epistasis[[2]]$NumID
 C  A  B 
-3 -1  2 


$long.epistasis[[3]]
$long.epistasis[[3]]$ids
[1] "-B" "-A" "C" 

$long.epistasis[[3]]$s
[1] 0.05

$long.epistasis[[3]]$NumID
 B  A  C 
-2 -1  3 


$long.epistasis[[4]]
$long.epistasis[[4]]$ids
[1] "-C" "A"  "B" 

$long.epistasis[[4]]$s
[1] 0.01

$long.epistasis[[4]]$NumID
 C  A  B 
-3  1  2 


$long.epistasis[[5]]
$long.epistasis[[5]]$ids
[1] "-A" "B"  "C" 

$long.epistasis[[5]]$s
[1] 0.02

$long.epistasis[[5]]$NumID
 A  B  C 
-1  2  3 


$long.epistasis[[6]]
$long.epistasis[[6]]$ids
[1] "-B" "A"  "C" 

$long.epistasis[[6]]$s
[1] 0.02

$long.epistasis[[6]]$NumID
 B  A  C 
-2  1  3 


$long.epistasis[[7]]
$long.epistasis[[7]]$ids
[1] "A" "B" "C"

$long.epistasis[[7]]$s
[1] -0.5

$long.epistasis[[7]]$NumID
A B C 
1 2 3 



$long.orderEffects
list()

$long.geneNoInt
data frame with 0 columns and 0 rows

$geneModule
   Gene Module GeneNumID ModuleNumID
1  Root   Root         0           0
2   APC      A         1           1
3 FBXW7      D         2           4
4  KRAS      C         3           3
5  TP53      B         4           2

$gMOneToOne
[1] FALSE

$geneToModule
   Root       A       B       C       D 
 "Root"   "APC"  "TP53"  "KRAS" "FBXW7" 

$graph
IGRAPH b304fee DN-- 5 8 -- 
+ attr: name (v/c), typeDep (e/c), color (e/c), lty (e/n), arrow.mode
| (e/n)
+ edges from b304fee (vertex names):
[1] Root->A Root->B A   ->C B   ->C C   ->D A   ->B A   ->C B   ->C

$drv
[1] 1 2 3 4

$rT
  parent child    s   sh typeDep
1   Root     A 0.20 -0.5    XMPN
2   Root     B 0.10 -0.5    XMPN
3      A     C 0.05 -0.5    XMPN
4      B     C 0.05 -0.5    XMPN
5      C     D 0.01 -0.5      MN

$epistasis
A : -B : -C -A : B : -C -A : -B : C  A : B : -C  -A : B : C  -B : A : C 
       0.20        0.10        0.05        0.01        0.02        0.02 
  A : B : C 
      -0.50 

$orderEffects
NULL

$noIntGenes
NULL

$fitnessLandscape
     [,1]

$fitnessLandscape_df
data frame with 0 columns and 0 rows

$fitnessLandscape_gene_id
data frame with 0 columns and 0 rows

$fitnessLandscapeVariables
character(0)

$frequencyDependentFitness
[1] FALSE

$frequencyType
[1] "freq_dep_not_used"

attr(,"class")
[1] "fitnessEffects"
In [3]:
# DAG representation
plot(CRC_W4, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [18]:
## Map genotypes to fitness
CRC_F2 <- evalAllGenotypes(CRC_W4, order = FALSE, addwt = TRUE)

(CRC_F2)
A evalAllGenotypes: 16 × 2
GenotypeFitness
<chr><dbl>
WT 1.000000
APC 1.440000
FBXW7 0.500000
KRAS 0.525000
TP53 1.210000
APC, FBXW7 0.720000
APC, KRAS 1.285200
APC, TP53 1.333200
FBXW7, KRAS 0.530250
FBXW7, TP53 0.605000
KRAS, TP53 1.178100
APC, FBXW7, KRAS 1.298052
APC, FBXW7, TP53 0.666600
APC, KRAS, TP53 0.330000
FBXW7, KRAS, TP53 1.189881
APC, FBXW7, KRAS, TP530.333300

Map Genotypes to Fitness (cont.)

In [19]:
## Plot of fitness landscape

plot(CRC_F2, use_ggrepel = TRUE)

Synthetic Viability with a Three-way Interaction

  • Highly deleterious effects if APC, TP53, or KRAS appear independently

  • Slightly deleterious effects were set if two of those genes appear in a genotype

  • Benefitial effect for the genotype of the three genes

  • Restrictions imposed with SM

In [20]:
## Simplified model
## SM because synthetic viability requires both parent nodes. 
## Define poset restrictions, mapping of genes to modules, and driver genes
(CRC_W6 <- allFitnessEffects(data.frame(parent = c(rep("Root", 2), "A", "B", "C"),
                                      child = c("A", "B", rep("C", 2), "D"),
                                      s = c(0.2, 0.1, rep(0.05, 2), 0.01),
                                      sh = -0.5,
                                      typeDep = c(rep("MN", 5))), 
                            epistasis = c("A : -B : -C" = -0.2,
                                          "-A : B : -C" = -0.2,
                                          "-A : -B : C" = -0.3,
                                          "A : B : -C" = -0.05,
                                          "-A : B : C" = -0.01,
                                          "A : -B : C" = -0.01,
                                          "A : B : C" = 0.5),
                           geneToModule = c("Root" = "Root",
                                            "A" = "APC",
                                            "B" = "TP53", 
                                            "C" = "KRAS",
                                            "D" = "FBXW7"),
                           drvNames = c("APC", "TP53", "KRAS", "FBXW7")))
$long.rt
$long.rt$A
$long.rt$A$child
[1] "A"

$long.rt$A$s
[1] 0.2

$long.rt$A$sh
[1] -0.5

$long.rt$A$typeDep
        MN 
"monotone" 

$long.rt$A$parents
[1] "Root"

$long.rt$A$childNumID
A 
1 

$long.rt$A$parentsNumID
Root 
   0 


$long.rt$B
$long.rt$B$child
[1] "B"

$long.rt$B$s
[1] 0.1

$long.rt$B$sh
[1] -0.5

$long.rt$B$typeDep
        MN 
"monotone" 

$long.rt$B$parents
[1] "Root"

$long.rt$B$childNumID
B 
2 

$long.rt$B$parentsNumID
Root 
   0 


$long.rt$C
$long.rt$C$child
[1] "C"

$long.rt$C$s
[1] 0.05

$long.rt$C$sh
[1] -0.5

$long.rt$C$typeDep
        MN 
"monotone" 

$long.rt$C$parents
[1] "A" "B"

$long.rt$C$childNumID
C 
3 

$long.rt$C$parentsNumID
A B 
1 2 


$long.rt$D
$long.rt$D$child
[1] "D"

$long.rt$D$s
[1] 0.01

$long.rt$D$sh
[1] -0.5

$long.rt$D$typeDep
        MN 
"monotone" 

$long.rt$D$parents
[1] "C"

$long.rt$D$childNumID
D 
4 

$long.rt$D$parentsNumID
C 
3 



$long.epistasis
$long.epistasis[[1]]
$long.epistasis[[1]]$ids
[1] "-C" "-B" "A" 

$long.epistasis[[1]]$s
[1] -0.2

$long.epistasis[[1]]$NumID
 C  B  A 
-3 -2  1 


$long.epistasis[[2]]
$long.epistasis[[2]]$ids
[1] "-C" "-A" "B" 

$long.epistasis[[2]]$s
[1] -0.2

$long.epistasis[[2]]$NumID
 C  A  B 
-3 -1  2 


$long.epistasis[[3]]
$long.epistasis[[3]]$ids
[1] "-B" "-A" "C" 

$long.epistasis[[3]]$s
[1] -0.3

$long.epistasis[[3]]$NumID
 B  A  C 
-2 -1  3 


$long.epistasis[[4]]
$long.epistasis[[4]]$ids
[1] "-C" "A"  "B" 

$long.epistasis[[4]]$s
[1] -0.05

$long.epistasis[[4]]$NumID
 C  A  B 
-3  1  2 


$long.epistasis[[5]]
$long.epistasis[[5]]$ids
[1] "-A" "B"  "C" 

$long.epistasis[[5]]$s
[1] -0.01

$long.epistasis[[5]]$NumID
 A  B  C 
-1  2  3 


$long.epistasis[[6]]
$long.epistasis[[6]]$ids
[1] "-B" "A"  "C" 

$long.epistasis[[6]]$s
[1] -0.01

$long.epistasis[[6]]$NumID
 B  A  C 
-2  1  3 


$long.epistasis[[7]]
$long.epistasis[[7]]$ids
[1] "A" "B" "C"

$long.epistasis[[7]]$s
[1] 0.5

$long.epistasis[[7]]$NumID
A B C 
1 2 3 



$long.orderEffects
list()

$long.geneNoInt
data frame with 0 columns and 0 rows

$geneModule
   Gene Module GeneNumID ModuleNumID
1  Root   Root         0           0
2   APC      A         1           1
3 FBXW7      D         2           4
4  KRAS      C         3           3
5  TP53      B         4           2

$gMOneToOne
[1] FALSE

$geneToModule
   Root       A       B       C       D 
 "Root"   "APC"  "TP53"  "KRAS" "FBXW7" 

$graph
IGRAPH f2a87fd DN-- 5 8 -- 
+ attr: name (v/c), typeDep (e/c), color (e/c), lty (e/n), arrow.mode
| (e/n)
+ edges from f2a87fd (vertex names):
[1] Root->A Root->B A   ->C B   ->C C   ->D A   ->B A   ->C B   ->C

$drv
[1] 1 2 3 4

$rT
  parent child    s   sh typeDep
1   Root     A 0.20 -0.5      MN
2   Root     B 0.10 -0.5      MN
3      A     C 0.05 -0.5      MN
4      B     C 0.05 -0.5      MN
5      C     D 0.01 -0.5      MN

$epistasis
A : -B : -C -A : B : -C -A : -B : C  A : B : -C  -A : B : C  A : -B : C 
      -0.20       -0.20       -0.30       -0.05       -0.01       -0.01 
  A : B : C 
       0.50 

$orderEffects
NULL

$noIntGenes
NULL

$fitnessLandscape
     [,1]

$fitnessLandscape_df
data frame with 0 columns and 0 rows

$fitnessLandscape_gene_id
data frame with 0 columns and 0 rows

$fitnessLandscapeVariables
character(0)

$frequencyDependentFitness
[1] FALSE

$frequencyType
[1] "freq_dep_not_used"

attr(,"class")
[1] "fitnessEffects"
In [3]:
# DAG representation
plot(CRC_W6, expandModules = TRUE, autofit = TRUE, lwdf = 2)

Map Genotypes to Fitness

In [21]:
## Map genotypes to fitness
CRC_F4 <- evalAllGenotypes(CRC_W6, order = FALSE, addwt = TRUE)
(CRC_F4)
A evalAllGenotypes: 16 × 2
GenotypeFitness
<chr><dbl>
WT 1.000000
APC 0.960000
FBXW7 0.500000
KRAS 0.350000
TP53 0.880000
APC, FBXW7 0.480000
APC, KRAS 0.594000
APC, TP53 1.254000
FBXW7, KRAS 0.353500
FBXW7, TP53 0.440000
KRAS, TP53 0.544500
APC, FBXW7, KRAS 0.599940
APC, FBXW7, TP53 0.627000
APC, KRAS, TP53 2.079000
FBXW7, KRAS, TP53 0.549945
APC, FBXW7, KRAS, TP532.099790

Map Genotypes to Fitness (cont.)

In [22]:
## Plot of fitness landscape

plot(CRC_F4, use_ggrepel = TRUE)

A Probabilistic Model of Mutually Exclusive Linearly Ordered Driver Pathways (Mohaghegh Neyshabouri et al.)

This model assumes driver genes are over-represented among those mutated across a large tumor collection and, thus, they can be identified in terms of frequency. Also, those participating of the same pathway are mutated in a mutually exclusive manner because more than one mutation in a pathway does not give any selective advantage to the clone. analyze one large dataset of colorectal adenocarcinoma (COADREAD) from IntOGen-mutations database.

Model

In [23]:
## Restriction table, including DAG of restrictions specifications and associated fitness
COADREAD_rT <- data.frame(
              parent = c("Root", "A", "B", "C", "D", "E", "F"), # Parent nodes
              child = c("A", "B", "C", "D", "E", "F", "G"), # Child nodes
                             s = 0.5, 
                             sh = c(rep(-1, 4), rep(-.5, 2), -.2),
                             typeDep = "MN")


## Create fitness specifications from DAG of restrictions considering modules 
COADREAD_fitness <- allFitnessEffects(
              COADREAD_rT, 
              geneToModule = c( 
                 "Root" = "Root",
                 "A" = "APC",
                 "B" = "TP53",
                 "C" = "KRAS",
                 "D" = "PIK3CA, NRAS",
                 "E" = "FBXW7, ARID1A",
                 "F" = "ATM, SMAD2",
                 "G" = "SOX9, SMAD4")) # Modules
In [3]:
## DAG of restrictions representation
plot(COADREAD_fitness, expandModules = TRUE, autofit = TRUE)

Map Genotypes to Fitness

In [24]:
## Evaluation of all possible genotypes fitness 
## under the previous fitness specifications
COADREAD_FL <- evalAllGenotypes(COADREAD_fitness, max = 131072)

## Fitness landscape representation
plotFitnessLandscape(COADREAD_FL)

Simplified Model

In [26]:
## Restriction table, including five-modules DAG of
## restrictions specifications and associated fitness
COADREAD_rT_5d <- data.frame(parent = c("Root", "A", "B", "C"), # Parent nodes
                             child = c("A", "B", "C", "D"), # Child nodes
                             s = 0.5, 
                             sh = c(rep(-1, 4)),
                             typeDep = "MN")

## Create fitness specifications from simplified DAG of restrictions
COADREAD_fitness_5d <- allFitnessEffects(COADREAD_rT_5d,
                                         geneToModule = c( "Root" = "Root",
                                                           "A" = "APC",
                                                           "B" = "TP53",
                                                           "C" = "KRAS",
                                                           "D" = "PIK3CA, NRAS"),
                                         drvNames = c("APC", "TP53", "KRAS",
                                                      "PIK3CA", "NRAS"))
In [3]:
## Simplified DAG of restrictions representation
plot(COADREAD_fitness_5d, expandModules = TRUE, autofit = TRUE)

Map Genotypes to Fitness

In [27]:
## Evaluation of all possible genotypes fitness under the previous fitness specifications
(COADREAD_FL_5d <- evalAllGenotypes(COADREAD_fitness_5d))
A evalAllGenotypes: 31 × 2
GenotypeFitness
<chr><dbl>
APC 1.5000
KRAS 0.0000
NRAS 0.0000
PIK3CA 0.0000
TP53 0.0000
APC, KRAS 0.0000
APC, NRAS 0.0000
APC, PIK3CA 0.0000
APC, TP53 2.2500
KRAS, NRAS 0.0000
KRAS, PIK3CA 0.0000
KRAS, TP53 0.0000
NRAS, PIK3CA 0.0000
NRAS, TP53 0.0000
PIK3CA, TP53 0.0000
APC, KRAS, NRAS 0.0000
APC, KRAS, PIK3CA 0.0000
APC, KRAS, TP53 3.3750
APC, NRAS, PIK3CA 0.0000
APC, NRAS, TP53 0.0000
APC, PIK3CA, TP53 0.0000
KRAS, NRAS, PIK3CA 0.0000
KRAS, NRAS, TP53 0.0000
KRAS, PIK3CA, TP53 0.0000
NRAS, PIK3CA, TP53 0.0000
APC, KRAS, NRAS, PIK3CA 0.0000
APC, KRAS, NRAS, TP53 5.0625
APC, KRAS, PIK3CA, TP53 5.0625
APC, NRAS, PIK3CA, TP53 0.0000
KRAS, NRAS, PIK3CA, TP53 0.0000
APC, KRAS, NRAS, PIK3CA, TP535.0625

Map Genotypes to Fitness

In [28]:
## Fitness landscape representation
plotFitnessLandscape(COADREAD_FL_5d, use_ggrepel = TRUE)
Warning message:
“ggrepel: 21 unlabeled data points (too many overlaps). Consider increasing max.overlaps”

Frequency-dependent fitness

Clones that coexist in a tumor can influence the fitness of each other in a frequency-dependent manner.

Frequency-dependent fitness (cont.)

In [29]:
## Mapping of genotypes to frequency-dependent fitness
# Not explicitly mapped genotypes are assigned a fitness of cero
COADREAD_gen_freqdep <- data.frame(
                Genotype = c("WT", "APC","APC, TP53",
                            "APC, TP53, KRAS",
                            "APC, TP53, KRAS, PIK3CA",
                            "APC, TP53, KRAS, NRAS",
                            "APC, TP53, KRAS, PIK3CA, NRAS"),
                            Fitness = c("1", "1.5",
                                        "2.25", "3.375", "5.0625", "5.0625",
              "5.0625 - ((f_APC_TP53_KRAS_PIK3CA + f_APC_TP53_KRAS_NRAS))/2"),
                            stringsAsFactors = FALSE)

## Fitness specifications
COADREAD_fitness_freqdep <- allFitnessEffects(genotFitness = COADREAD_gen_freqdep, 
                                                   frequencyDependentFitness = TRUE,
                                                   frequencyType = "rel")

Map Genotypes to Fitness

In [30]:
## Evaluate all genotypes considering population sizes of the clones
(COADREAD_FL_freqdep <- evalAllGenotypes(COADREAD_fitness_freqdep,
                                 spPopSizes = c("WT" = 5, "APC" = 5, "APC, TP53" = 5,
                                                "APC, TP53, KRAS" = 10,
                                                "APC, TP53, KRAS, PIK3CA" = 50,
                                                "APC, TP53, KRAS, NRAS" = 50,
                                                "APC, TP53, KRAS, PIK3CA, NRAS" = 50)))
A evalAllGenotypes: 32 × 2
GenotypeFitness
<chr><dbl>
WT 1.000000
APC 1.500000
KRAS 0.000000
NRAS 0.000000
PIK3CA 0.000000
TP53 0.000000
APC, KRAS 0.000000
APC, NRAS 0.000000
APC, PIK3CA 0.000000
APC, TP53 2.250000
KRAS, NRAS 0.000000
KRAS, PIK3CA 0.000000
KRAS, TP53 0.000000
NRAS, PIK3CA 0.000000
NRAS, TP53 0.000000
PIK3CA, TP53 0.000000
APC, KRAS, NRAS 0.000000
APC, KRAS, PIK3CA 0.000000
APC, KRAS, TP53 3.375000
APC, NRAS, PIK3CA 0.000000
APC, NRAS, TP53 0.000000
APC, PIK3CA, TP53 0.000000
KRAS, NRAS, PIK3CA 0.000000
KRAS, NRAS, TP53 0.000000
KRAS, PIK3CA, TP53 0.000000
NRAS, PIK3CA, TP53 0.000000
APC, KRAS, NRAS, PIK3CA 0.000000
APC, KRAS, NRAS, TP53 5.062500
APC, KRAS, PIK3CA, TP53 5.062500
APC, NRAS, PIK3CA, TP53 0.000000
KRAS, NRAS, PIK3CA, TP53 0.000000
APC, KRAS, NRAS, PIK3CA, TP534.776786

Map Genotypes to Fitness (cont.)

In [31]:
## Fitness landscape representation
plotFitnessLandscape(COADREAD_FL_freqdep)

Conclusions

  • Fitness landscapes include all possible paths of tumor progression and enable the study of different outcomes depending on the selected evolutionary model

  • We have mapped various cancer progression models (mutual exclusivity assumptions) to different evolutionary models and analyzed the resulting fitness landscapes

  • All this evidence supports the idea that cancer progression is better explained through an evolutionary perspective

  • Current cancer progression models cannot capture these phenomena and, thus, conclusions derived from them should be questioned

  • OncoSimulR is a very complete and convenient clone-based genetic simulator to evaluate cancer progression under evolutionary models, but is limited to:

    • Small sets of genes for fitness landscapes interpretation.
    • No fitness landscape for studying order of effects.
    • Modules do not allow to define epistatic relationships between genes of the same module.

References

  1. Raphael BJ, Vandin F. Simultaneous Inference of Cancer Pathways and Tumor Progression from Cross-Sectional Mutation Data. Journal of Computational Biology. 2015;22(6):510–27. doi: 10.1089/cmb.2014.0161
  2. Schill R, Solbrig S, Wettig T, Spang R. Modelling cancer progression using Mutual Hazard Networks. Bioinformatics. 2020;36(1):241–9. doi: 10.1093/bioinformatics/btz513
  3. Sprouffske K, Pepper JW, Maley CC. Accurate reconstruction of the temporal order of mutations in neoplastic progression. Cancer Prevention Research. 2011;4(7):1135–44. doi: 10.1158/1940-6207.CAPR-10- 0374
  4. Pon JR, Marra MA. Driver and passenger mutations in cancer. Annual Review of Pathology: Mechanisms of Disease. 2015; doi: 10.1146/annurev-pathol-012414-040312
  5. Diaz-Uriarte R. Cancer progression models and fitness landscapes: A many-to-many relationship. Bioinformatics. 2018;34(5):836–44. doi: 10.1093/bioinformatics/btx663
  6. Tomasetti C, Vogelstein B, Parmigiani G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proceedings of the National Academy of Sciences of the United States of America. 2013; doi: 10.1073/pnas.1221068110
  7. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976; doi: 10.1126/science.959840
  8. Lee EYHP, Muller WJ. Oncogenes and tumor suppressor genes. 2010. doi: 10.1101/cshperspect.a003236
  9. Diaz-Uriarte R. Identifying restrictions in the order of accumulation of mutations during tumor progression: Effects of passengers, evolutionary models, and sampling. BMC Bioinformatics. 2015;16(1):1–26. doi: 10.1186/s12859-015-0466-7
  10. Cristea S, Kuipers J, Beerenwinkel N. PathTiMEx: Joint Inference of Mutually Exclusive Cancer Pathways and Their Progression Dynamics. Journal of Computational Biology. 2017;24(6):603–15. doi: 10.1089/cmb.2016.0171
  11. Neyshabouri MM, Jun SH, Lagergren J. Inferring tumor progression in large datasets. PLoS Computational Biology. 2020;16(10):1–16. Available from: http://dx.doi.org/10.1371/journal.pcbi.1008183
  12. Ortmann CA, Kent DG, Nangalia J, Silber Y, Wedge DC, Grinfeld J, et al. Effect of Mutation Order on Myeloproliferative Neoplasms. New England Journal of Medicine. 2015 Feb;372(7):601–12. [accessed 20 Jan 2021] Available from: https://doi.org/10.1056/NEJMoa1412098
  13. Diaz-Uriarte R. OncoSimulR: Genetic simulation with arbitrary epistasis and mutator genes in asexual populations. Bioinformatics. 2017;33(12):1898–9. doi: 10.1093/bioinformatics/btx077
  14. Wang X, Fu AQ, McNerney ME, White KP. Widespread genetic epistasis among cancer genes. Nature Communications. 2014 Nov;5(1):4828. [accessed 20 Jan 2021] Available from: https://www.nature.com/ articles/ncomms5828
  15. Haar J van de, Canisius S, Yu MK, Voest EE, Wessels LFA, Ideker T. Identifying Epistasis in Cancer Genomes: A Delicate Affair. Cell. 2019 May;177(6):1375–83. [accessed 20 Jan 2021] Available from: http: //www.sciencedirect.com/science/article/pii/S0092867419305033
  16. Reia SM, Campos PRA. Analysis of statistical correlations between properties of adaptive walks in fitness landscapes. Royal Society Open Science. 7(1):192118. [accessed 20 Jan 2021] Available from: https: //royalsocietypublishing.org/doi/10.1098/rsos.192118
  17. Gu Y, Wang R, Han Y, Zhou W, Zhao Z, Chen T, et al. A landscape of synthetic viable interactions in cancer. Briefings in Bioinformatics. 2018 Jul;19(4):644–55. [accessed 20 Jan 2021] Available from: https://doi.org/10.1093/bib/bbw142
  18. Archetti M, Pienta KJ. Cooperation among cancer cells: Applying game theory to cancer. Nature Reviews Cancer. 2019;19(2):110–7.
  19. Gerstung M, Eriksson N, Lin J, Vogelstein B, Beerenwinkel N. The temporal order of genetic and pathway alterations in tumorigenesis. PLoS ONE. 2011;6(10). doi: 10.1371/journal.pone.0027136
  20. Ciriello G, Cerami E, Sander C, Schultz N. Mutual exclusivity analysis identifies oncogenic network modules. Genome Research. 2012; doi: 10.1101/gr.125567.111
  21. Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, et al. The genomic landscapes of human breast and colorectal cancers. Science. 2007; doi: 10.1126/science.1145720
  22. Beerenwinkel N, Antal T, Dingli D, Traulsen A, Kinzler KW, Velculescu VE, et al. Genetic progression and the waiting time to cancer. PLoS Computational Biology. 2007; doi: 10.1371/journal.pcbi.0030225
  23. Yeang C-H, McCormick F, Levine A. Combinatorial patterns of somatic gene mutations in cancer. The FASEB Journal. 2008; doi: 10.1096/fj.08-108985